Web Information Segmentation Method Based on DOM Structure Tree

Computer and Modernization, 2013, Vol. 218, Issue (10): 229-232. doi: 10.3969/j.issn.1006-2475.2013.10.056

• Network and Communication •

ZHOU Jian¹, TANG Jin¹,², LUO Bin¹,²

1. School of Computer Science and Technology, Anhui University, Hefei 230601, China; 2. Key Lab of Industrial Image Processing & Analysis of Anhui Province, Hefei 230039, China

Received: 2013-05-23; Revised: 1900-01-01; Online: 2013-10-26; Published: 2013-10-26

Abstract

Correct extraction and segmentation of Web information is important for text mining. This paper proposes and implements a method that extracts the informative content of a Web page while preserving the segmentation of the original text. The method first uses the page layout tags <table> and <div> to build a DOM structure tree. It then uses the nesting relations among the layout tags, as reflected in the DOM structure tree, to select the content blocks and extract their text correctly. Finally, it segments the body text by processing certain special tags. The experimental results show that the method is easy to implement and efficient, and that it automatically extracts the informative content and segments it accurately.
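
The three steps in the abstract (build a tree from <table> and <div>, pick the content block, split the body on special tags) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how content blocks are scored or which tags count as "special", so the names Block, LayoutTreeBuilder, and segment_page, the BREAK_TAGS set, and the rule "choose the layout block that directly owns the most text" are all assumptions made for illustration.

```python
from html.parser import HTMLParser

LAYOUT_TAGS = {"table", "div"}  # layout tags named in the abstract
# Assumed "special tags" that mark paragraph boundaries (not given in the abstract).
BREAK_TAGS = {"p", "br", "h1", "h2", "h3", "h4", "h5", "h6", "li", "tr"}
SKIP_TAGS = {"script", "style"}  # assumed noise: never treated as text


class Block:
    """One node of the layout tree: a <table> or <div> and the text it owns directly."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.segments = [[]]  # one list of text pieces per paragraph

    def add_text(self, text):
        self.segments[-1].append(text)

    def break_segment(self):
        # Start a new paragraph only if the current one has content.
        if self.segments[-1]:
            self.segments.append([])

    def paragraphs(self):
        return [" ".join(pieces) for pieces in self.segments if pieces]

    def own_length(self):
        # Text inside nested layout blocks is not counted here.
        return sum(len(p) for p in self.paragraphs())


class LayoutTreeBuilder(HTMLParser):
    """Builds a tree that contains only layout tags; other tags are flattened."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = Block("root")
        self.current = self.root
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in LAYOUT_TAGS:
            self.current.break_segment()
            node = Block(tag, self.current)
            self.current.children.append(node)
            self.current = node
        elif tag in BREAK_TAGS:
            self.current.break_segment()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in LAYOUT_TAGS:
            # Close the nearest open block with this tag; ignore stray end tags.
            node = self.current
            while node is not self.root and node.tag != tag:
                node = node.parent
            if node is not self.root:
                self.current = node.parent
        elif tag in BREAK_TAGS:
            self.current.break_segment()

    def handle_data(self, data):
        text = " ".join(data.split())
        if text and not self.skip_depth:
            self.current.add_text(text)


def iter_blocks(node):
    """Yield every layout block below node, depth first."""
    for child in node.children:
        yield child
        yield from iter_blocks(child)


def segment_page(html):
    """Return the paragraphs of the layout block that directly owns the most text."""
    builder = LayoutTreeBuilder()
    builder.feed(html)
    builder.close()
    blocks = list(iter_blocks(builder.root))
    if not blocks:
        return builder.root.paragraphs()
    return max(blocks, key=Block.own_length).paragraphs()
```

Calling segment_page on a page string returns a list of paragraph strings taken from the selected block. Counting only directly owned text keeps an outer wrapper <div> from winning just because it encloses everything; a different scoring rule, such as link density, could be substituted without changing the tree-building step.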

Key words: semantic markup, layout label, segmentation, noise

CLC Number: TP393

ZHOU Jian, TANG Jin, LUO Bin. Web Information Segmentation Method Based on DOM Structure Tree[J]. Computer and Modernization, 2013, 218(10): 229-232.
